One of the most fundamental problems in vision is segmentation: the way in which parts of an
image are perceived as a meaningful whole.

Recent work has shown how to calculate images of physical parameters from raw intensity data.
Such images are known as intrinsic images, and examples are images of velocity (optical flow),
surface orientation, occluding contour, and disparity. The principal difficulty with intrinsic images is
that each by itself is generally underconstrained; they can only be computed in parallel with each
other and with the use of parameters obtained through segmentation.

While intrinsic images are not segmented, they are distinctly easier to segment than the original
intensity image. If parts of these images are organized in some way, this organization can be
detected by a general Hough transform technique. Networks of feature parameters are appended to
the intrinsic image organization. Then the intrinsic image points are mapped into these networks.
This mapping will be many-to-one onto interesting parameter values. This basic relationship is
extended into a general representation and control technique with the addition of three main ideas:
abstraction levels; sequential search; and tight coupling. These ideas are a nucleus of a theory of
low-level and intermediate-level vision. This theory explains segmentation in terms of highly parallel
cooperative computation among intrinsic images and a set of parameter spaces at different levels of
abstraction.


The preparation of this paper was supported in part by the Defense Advanced Research
Projects Agency, monitored by the ONR, under Contracts N00014-78-C-0164 and N00014-80-C-0197.
1. Overview

One of the most troublesome puzzles in vision is how parts of an image are seen as a
meaningful whole or segment. This is known as the segmentation problem. The ambiguous use of
segment, which means part, to denote a whole, arises from the fact that a segment is an
intermediate component in a description which relates an object with an image. From the viewpoint
of the object description, the segment is a part. From the viewpoint of a group of image points with
common properties, the segment is a whole.

Parts of an image are seen as a segment if the corresponding physical object has common
physical properties, or features. For example, if a connected component of the image has a single
color, say red, then it may be seen as a segment. The patch of red arises from the physical object's
surface reflectance. Usually there are not one but several features which have the same spatial
registration. For example, an object may be moving, red, and a cube. Figure 1a shows this case.
Segmentation is more difficult when features are not spatially registered. Figure 1b shows a
multicolored cube. Which feature should be the most compelling, the color or the geometric lines
indicating the cube? In the general case the answer depends on the goals of the perceiver. Another
common problem occurs when an object is occluded (Figure 1c); a theory of low-level vision must
be able to explain how an object is seen as a segment when the features are only partially registered
or incomplete. Real image data are also noisy, and many segments are perceived only through the
combination of weak evidence from several features. The evidence may be so weak that each feature, if
viewed in isolation, would be uninterpretable.

Figure  1 

We develop the nucleus of a theory of low-level and intermediate-level vision which explains
the above aspects of segmentation in terms of massively parallel cooperative computation [Rosenfeld
et al., 1976; Zucker, 1976; Marr, 1979] between two groups of networks. One group, intrinsic images
[Barrow and Tenenbaum, 1978], can be computed primarily in terms of local constraints. The other,
termed a feature space, can be computed primarily in terms of global mappings from intrinsic
images to feature space. Feature space itself may have many different levels of abstraction. Intrinsic
images and feature spaces are collectively called parameter networks because they both have a
common organization. That is, the network is an organization of basic units, each representing
values of a particular parameter. The simple structure of units simplifies the control task and also
makes the network representation easily extendable. The basic elements of the theory are the
following.

1) The cooperative computation of several intrinsic images.

Recent work has shown how to calculate intrinsic images from raw intensity
data. Examples are images of velocity (optical flow) [Horn and Schunck, 1980;
Ullman, 1977; 1979], surface orientation [Horn and Sjoberg, 1978; Ikeuchi, 1980],
occluding contour [Prager, 1980; Rosenfeld et al., 1976], and disparity [Marr and
Poggio, 1976; Barnard and Thompson, 1979]. Intrinsic images can be computed
independently under special conditions, but in general they are interdependent.

Intrinsic images are consistent with the hypothesis that the visual system builds
many intermediate descriptions from image data. These descriptions represent
important parameters such as velocity, depth, and surface reflectance explicitly, since
in the explicit form they are easier to map into object descriptions.

2) The extraction of useful parameters from intrinsic images.

If parts of the intrinsic image are organized in some way, this organization
can be detected by a general Hough transform technique [Duda and Hart, 1972;
Ballard, 1981a; Kender, 1978; Ohlander et al., 1979]. This is done by describing
the organization in terms of parameters and then mapping the intrinsic image
points into parameter space. The transformation will be many-to-one onto
parameter values which represent meaningful units. A major advantage of the
Hough transform is that it is relatively insensitive to occlusion and noise.

3)  Interactions  involving  several  levels  of  abstraction. 

The Hough transform is a way of seeing spatial information as a unit.
However, if the unit has a complex structure the mapping from space to unit can
be unmanageably complex. A way around this is to introduce units of
intermediate levels of abstraction [Sabbah, 1981; Ballard and Sabbah, 1981;
Kender, 1978]. This reduces a complex transform to several simpler transforms
between units at successively higher levels of abstraction.

4) Focus-of-attention mechanisms.

Visual focus-of-attention can be partly explained as the conjunction of two
mechanisms: 1) the use of Hough transforms to modify sensor input; and 2) the
sequential application of Hough transforms.

5)  Coupling  between  intrinsic  images  and  parameters. 

In general, intrinsic images cannot be computed without global parameters.
At the same time, these global parameters are what we mean by seeing parts of
the intrinsic image as a segment. In these cases the intrinsic image and
parameters are said to be tightly coupled: although each cannot be computed
independently, they can be computed simultaneously [Ballard, 1981b; 1981c].

We re-emphasize that our interest is low-level vision. Thus in item (4) above, focus of attention is
interpreted in a narrow sense; visual features which are clear can help the recognition of other
features (or perhaps direct eye movements). We do not attempt to explain general plans and goals.

Representations  for  Parameter  Networks 

The basic element of a parameter network is a parameter node. A parameter node
represents a single parameter value and has an associated confidence. The value is a set of numerical
measurements for the node; the confidence is a measure of their believability. For example, if there
is an edge at (10,10) with orientation 30° and length 5 units, the vector value of the parameter node
representing the edge is $(x, y, \theta, s) = (10, 10, 30^\circ, 5)$. The associated confidence is a measure of the
fuzziness of this estimate. One way a confidence may be increased is if there are nearby edges of
the same orientation which align. Thus in Figure 2 the edges in (a) and (b) have the same value but
we can be more confident in case (b).
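
As a minimal sketch of this representation, the following Python fragment models a parameter node as a value vector with a confidence and raises the confidence when nearby edges share its orientation. The class name `ParameterNode`, the function `support_from_neighbors`, and the distance, angle, and gain constants are all illustrative assumptions, not part of the theory; "alignment" is reduced here to proximity plus similar orientation.

```python
from dataclasses import dataclass
import math

@dataclass
class ParameterNode:
    """One unit of a parameter network: a value vector plus a confidence."""
    value: tuple       # e.g. (x, y, theta_degrees, length) for an edge
    confidence: float  # believability of the value, kept in [0, 1]

def support_from_neighbors(node, neighbors, max_dist=3.0, max_angle=10.0, gain=0.1):
    """Return a copy of `node` whose confidence is raised once for every
    nearby edge of similar orientation (angle wrap-around is ignored)."""
    x, y, theta, _ = node.value
    c = node.confidence
    for other in neighbors:
        ox, oy, otheta, _ = other.value
        near = math.hypot(ox - x, oy - y) <= max_dist
        aligned = abs(otheta - theta) <= max_angle
        if near and aligned:
            c = min(1.0, c + gain)
    return ParameterNode(node.value, c)

# The edge from the text: position (10, 10), orientation 30 degrees, length 5.
edge = ParameterNode((10, 10, 30.0, 5), 0.5)
isolated = support_from_neighbors(edge, [])
supported = support_from_neighbors(edge, [ParameterNode((12, 11, 30.0, 5), 0.5),
                                          ParameterNode((8, 9, 32.0, 5), 0.5)])
```

Both results carry the same value vector; only the confidence differs, which is the distinction drawn between cases (a) and (b).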

Figure  2. 

This paper assumes a very simple model; namely, collections of value units. Each value unit is
connected to a subset of other value units, and can alter only those units. Underlying physical
principles determine the appropriate connection subsets. The confidence updating is done by
nonlinear relaxation. The overall structure of the paper is slanted towards abstractions of physical
principles; however, we also show how these principles are implemented in the networks.

2.  Intrinsic  Images 

An intrinsic image is an image of some important parameter that is in registration with the
original intensity image [Barrow and Tenenbaum, 1978; Marr, 1979]; that is, each parameter is
indexed by retinal coordinates. For example, in the velocity (optical flow) image, one is able to
compute at each point in time and for each spatial position a local velocity vector $v(x, t)$. Figure 3
shows Horn's example for a rotating sphere [Horn and Schunck, 1980]. Intrinsic images may only be
computable over certain parts of the image, and over those parts the parameters are continuously
varying. While intrinsic images are not segmented into parts of objects, they are distinctly easier to
segment than the original intensity image. Other examples of such images are surface orientation,
occluding contour, and disparity.

Figure  3. 

Very recently there has been rapid progress in finding algorithms for computing intrinsic
images from intensity data. What is remarkable is that each such image type is computed in the
same manner. Two constraints, one derived from physical principles and the other from a constraint
that the resultant images should be locally smooth, suffice to specify a parallel-iterative algorithm.
Table 1 shows this commonality but is not an exhaustive list of approaches. See Section 1 for
additional references.


Table 1: Intrinsic Images

Parameter             Physical Constraint                         Smoothness Constraint   Refs.

Edge orientation      boundaries are                              nearby edges            Prager 1979
$\theta$              locally linear                              should align

Disparity             if $x$ corresponds                          neighboring points      Marr and Poggio 1976
$d$                   to $x'$ then                                should have
                      $f(x + \Delta) = f(x' + \Delta)$            similar disparities

Surface orientation   $f(x) = R(\theta, \phi, \theta_s, \phi_s)$  $\nabla^2 \theta = 0$   Ikeuchi 1980
$\theta, \phi$        $\theta_s, \phi_s$ is the light             $\nabla^2 \phi = 0$
                      source direction

Optical flow          $df/dt = 0$                                 $\nabla^2 u = 0$        Horn and Schunck 1980
$u, v$                                                            $\nabla^2 v = 0$

While the above algorithms work well on images which are constrained to satisfy the underlying
assumptions, they may not work in the general case. Almost always there are free parameters or
boundary conditions which have to be determined independently.
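
The role of the smoothness constraint and of independently supplied boundary conditions can be illustrated with a small relaxation sketch. It is not any of the cited algorithms: it only iterates the smoothness term (a discrete Laplacian driven toward zero) on a scalar grid, with a hypothetical `known` mask standing in for values fixed by a physical constraint or boundary condition. The function name and grid are assumptions made for illustration.

```python
def relax_smooth(field, known, iterations=500):
    """Jacobi-style relaxation: every free cell is replaced by the mean of its
    four neighbours, which drives the discrete Laplacian toward zero.
    Cells flagged in `known` act as boundary conditions and never change."""
    rows, cols = len(field), len(field[0])
    u = [row[:] for row in field]
    for _ in range(iterations):
        new = [row[:] for row in u]           # all cells update from the old state
        for i in range(rows):
            for j in range(cols):
                if known[i][j]:
                    continue
                # neighbours outside the grid are replaced by the nearest cell
                up = u[max(i - 1, 0)][j]
                down = u[min(i + 1, rows - 1)][j]
                left = u[i][max(j - 1, 0)]
                right = u[i][min(j + 1, cols - 1)]
                new[i][j] = (up + down + left + right) / 4.0
        u = new
    return u

# Left column fixed at 0, right column fixed at 4; the interior is free.
field = [[0.0, 0.0, 0.0, 0.0, 4.0] for _ in range(5)]
known = [[j in (0, 4) for j in range(5)] for _ in range(5)]
smooth = relax_smooth(field, known)
```

With these boundary values the interior settles to a linear ramp between the two fixed columns. Without the fixed columns the iteration would have nothing to anchor it, which is the sense in which boundary conditions must come from elsewhere.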

2.1 Multi-Resolution Relaxation Methods

One general notion of "boundary condition" is image resolution. Previous methods for
computing intrinsic images have used a single image resolution, but in most situations this is
unrealistic. What is the correct resolution? At high resolution

*  noise  is  a  factor 

*  convergence  is  slow 

*  basic  assumptions  may  not  hold 

To see the last point, imagine a surface with a micro-texture. At low resolution the surface
structure is blurred and simple reflectance models hold, but at high resolution the microstructure
can render such models useless. At low resolution

*  noise  is  less  of  a  factor 

*  convergence  is  fast 

*  basic  assumptions  may  not  hold 

The last point arises from the fact that most intrinsic images are computed from constraints which
assume local variations are smooth. With increasing grid resolutions, these assumptions are less
likely to be valid.


Hence a conjecture is that there is a range of resolutions for which the computations will be
valid. Furthermore, this range is expected to be spatially variant. A tool for exploring this conjecture
is multigrid relaxation techniques [Brandt, 1977], which have proven very useful for solving
differential equations. This model, together with reasoning from physical first principles, should
allow the determination of image-dependent grid resolutions for which intrinsic image computations
are valid. Multigrid techniques are of course related to pyramids [Tanimoto and Pavlidis, 1975;
Hanson and Riseman, 1978].

2.2  Cooperative  Computation  of  Multiple  Intrinsic  Images 

Intrinsic images are logically computed simultaneously. In fact, they have to be; otherwise each
intrinsic image is underdetermined in the general case. (Only on certain synthetic images is the
computation well-defined.) Furthermore, they are highly interdependent, particularly at points of
discontinuity [Barrow and Tenenbaum, 1978]. For example:

* intensity edges can be indicative of depth discontinuities. Thus the edge image
is coupled to the disparity image;

* surface orientation is also indicative of depth discontinuity and is thus related
to the other two; and

* different objects which are moving relative to each other produce
discontinuities in the flow field.

By incorporating these couplings in the intrinsic image computations, one should find general
cases where the computations will converge. A separate issue is the behavior of the coupled
computations in the face of conflicting information.

2.3  Intrinsic  Images  at  Different  Levels  of  Abstraction 

The survey of intrinsic images (Table 1) omitted the fact that intrinsic images may have fine
structure involving several levels of abstraction. In fact, it seems likely that multiple abstraction
levels are necessary in many cases. For example, Zucker [1980] uses two levels of abstraction in
computing orientation intrinsic images, one for points of high gradients and the other for edge
segments. The computation of a velocity image in 3-d could involve three levels of abstraction:

* a change detection level where units are used for variations in intensity over
space and time, $\Delta I/\Delta x'$, $\Delta I/\Delta y'$, $\Delta I/\Delta t$ (primes denote retinal coordinates);

* an optical flow level where units correspond to retinal velocities
$(u(x', y'), v(x', y'))$; and

* a 3-d flow level where units correspond to 3-d velocities
$(v_x(x, y, z), v_y(x, y, z), v_z(x, y, z))$.

The feasibility of computing the optical flow from change measures has been studied by
[Barnard and Thompson, 1979; Prager, 1980; Horn and Schunck, 1980]. The feasibility of computing
3-d flow is explored in [Ballard, 1981c].

2.4  Intrinsic  Images  and  Parameter  Nodes 

Two models have been used to compute intrinsic images: 1) the value unit defined in Section 1
[Prager, 1980; Marr and Poggio, 1976]; and 2) a variable unit [Ikeuchi, 1980; Horn and Schunck,
1980]. In the first model there is a unit for every value of every variable; in effect the representation
has only constants. Constant value units may have outputs which are confidences between zero and
one. In the second model, each unit represents a variable which can take on values (the standard
method is to use an array for these units). The output is the value; there is no explicit notion of
confidence.


In general the unit/value representation is sufficient since problems formulated to use variables
can be transformed into unit/value problems in the following manner. Suppose $x$, $y$, and $z$ satisfy a
relation $R(x, y, z) = 0$. Let us use a set of values $A$ for $x$, $B$ for $y$, and $C$ for $z$. Writing $C(\cdot)$
for the confidence of a value unit (not to be confused with the value set $C$), where $a \in A$ we
would like $C(a)$ to be 1 if there exist $b \in B$ and $c \in C$ such that $C(b) = 1$, $C(c) = 1$, and
$R(a, b, c) = 0$. To implement this in a parameter network, connect all pairs $(b, c) \in B \times C$ to a
value $(a)$ if $R(a, b, c) = 0$. Then, starting with initial confidences, increment $C(a)$ if there exist
$(b, c)$ such that $R(a, b, c) = 0$ and $C(b) + C(c) >$ some threshold. The individual values $b$ and $c$
may be treated similarly.
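
The construction can be sketched in a few lines of Python. The relation $R(a, b, c) = a - (b + c)$, the value ranges, the threshold of 1.5, and the increment of 0.25 are hypothetical choices made only so that the example runs; the function names are likewise illustrative.

```python
from itertools import product

def relation(a, b, c):
    """Hypothetical relation R(a, b, c) = a - (b + c), for illustration only."""
    return a - (b + c)

def build_connections(A, B, C):
    """Connect each pair (b, c) to the value unit a whenever R(a, b, c) = 0."""
    return {a: [(b, c) for b, c in product(B, C) if relation(a, b, c) == 0]
            for a in A}

def update(conf_a, conf_b, conf_c, connections, threshold=1.5, step=0.25):
    """Increment C(a) if some connected pair (b, c) has C(b) + C(c) > threshold."""
    new = dict(conf_a)
    for a, pairs in connections.items():
        if any(conf_b[b] + conf_c[c] > threshold for b, c in pairs):
            new[a] = min(1.0, conf_a[a] + step)
    return new

A, B, C = range(0, 7), range(0, 4), range(0, 4)
connections = build_connections(A, B, C)
conf_a = {a: 0.0 for a in A}
conf_b = {b: (1.0 if b == 2 else 0.0) for b in B}   # the unit y = 2 is confident
conf_c = {c: (1.0 if c == 3 else 0.0) for c in C}   # the unit z = 3 is confident
for _ in range(4):
    conf_a = update(conf_a, conf_b, conf_c, connections)
```

Only the unit for $a = 5$ receives support, because it is the only value connected to a pair whose confidences jointly exceed the threshold. The thresholded update is what makes the rule nonlinear.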

Note that the updating function is nonlinear when the underlying physical relation $R$ is
nonlinear. If the relation $R$ can be linearized then the cooperative computations can be shown to be
equivalent to linear programming [Hinton, 1979]. The linear case has also been analyzed by
[Hummel and Zucker, 1980].

3.  Parameter  Spaces 

What does it mean to perceive parts of an image as a segment? In our theory, this perception
takes place if there is a parameter space such that each of the parts can have the same parameter
value. This general idea is illustrated by the following examples.

* Parts of a color image may be seen as a segment if they have the same color.
In this case the parameter space is a space of colors and the parts map into a
common point representing the common color.

* Parts of an optical flow image may be seen as a segment if they are part of a
rigid body that is moving. In this case the parameter space represents the rigid
body motion parameters of translational and rotational velocity and parts of
the image map into a common point in that space.

* Parts of edge and surface orientation images may be seen as a segment if they
are part of the same shape. This case is more complicated as there must exist
some internal representation of the shape. Given this representation, the
parameter space represents the transformation (scale, rotation, translation)
from the internal representation to the (viewer-centered) image representation.
Parts of the image which are seen as the shape have common values for these
parameters.

A general way of describing this relationship between parts of an image and the associated
parameters is the Hough transform [Hough, 1962; Duda and Hart, 1972; Kimme et al., 1975;
Shapiro, 1978]. In our low-level vision theory, Hough transforms relate intrinsic images and feature
spaces, and feature spaces at different levels of abstraction. If the intrinsic image parameter is a
vector $(x, a(x)) \in A$ and an element of feature space is a vector $b \in B$, then there is usually a
physical constraint that relates $a(x)$ and $b$, i.e., some relation $f(a, b)$ such that

$$f(a, b) = 0.$$

The space $A$ represents all possible intrinsic image values. A particular intrinsic image is
described by a set of values $\{a_k\}$ where $a_k = a(x_k)$. Now the set $\{a_k\}$ is only consistent with
certain elements in the space $B$, owing to the constraint imposed by the relation $f$. This physical
constraint can be exploited in the following manner. For each $a_k$ we can compute the set

$$B_k = \{\, b \mid a_k \text{ and } b \text{ satisfy } f(a_k, b) \le \delta_b \,\}$$

where $\delta_b$ is a constant. Define $H(b)$ as the number of times the value $b$ occurs in $\bigcup_k B_k$.
$H(b)$ is the Hough transform from the space $A$ to the space $B$ and is the number of points in
intrinsic image space which are consistent with the parameter value $b$. $H(b)$ makes the most sense
when the values of both $(a(x), x)$ and $b$ are discrete. Hence the constant $\delta_b$ above is related to
the quantization in the space $B$. It is also best normalized by defining
$C(b) := H(b) / \max_b H(b)$. In that case, the value $C(b)$ can stand for the
confidence that the segment with feature value $b$ is present in the image.

Concerning the implementation of Hough transforms in networks, $B_k \subset B$ is the subset of $B$
units to which the unit $a_k$ should be connected in the network. A separate $H_{\max}$ unit is needed for
normalization.
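
A minimal sketch of this definition follows, assuming discrete spaces and reading the constraint as satisfied when $|f(a, b)|$ is within a tolerance `delta` (one interpretation of the quantization constant). The straight-line constraint and the point set are hypothetical examples; `hough` and `confidence` are illustrative names.

```python
from collections import Counter

def hough(points_a, space_b, f, delta):
    """H(b): number of a_k for which b lies in B_k = {b : |f(a_k, b)| <= delta}."""
    H = Counter()
    for a in points_a:                       # each a_k votes independently
        for b in space_b:
            if abs(f(a, b)) <= delta:        # b belongs to B_k
                H[b] += 1
    return H

def confidence(H):
    """C(b) = H(b) / max_b H(b), so the strongest parameter value has C = 1."""
    peak = max(H.values())
    return {b: count / peak for b, count in H.items()}

# Illustrative constraint: the point a = (x, y) lies on the line y = m*x + c,
# with parameter vector b = (m, c).
def line_constraint(a, b):
    (x, y), (m, c) = a, b
    return y - (m * x + c)

points = [(0, 1), (1, 3), (2, 5), (3, 7), (2, 0)]   # four on y = 2x + 1, one outlier
space_b = [(m, c) for m in range(-3, 4) for c in range(-3, 4)]
H = hough(points, space_b, line_constraint, delta=0)
C = confidence(H)
best = max(C, key=C.get)
```

In network terms, the inner test decides which $B$ units a given $a_k$ is wired to, and the division by the peak plays the part of the separate normalizing unit.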

The Hough transform need not originate from intrinsic image space but can be defined
between any two spaces $A$ and $B$ as long as there is some relation $f(a, b) = 0$ for $a \in A$ and $b \in B$.
To avoid describing the above computations in detail, we can use a shorthand notation for Hough
transforms. Each transform can be described as the triple

$$\langle a, b, f \rangle$$

where the necessary computations are implicit. Note that the order of $a$ and $b$ is important in the
notation; in general, $\langle a, b, f \rangle$ is not equivalent to $\langle b, a, f \rangle$.

As a very simple example of a Hough transform, we describe how a patch of red in an image
may be seen as a unit. For this to happen, an association is made between the spatially contiguous
points in the image and the particular value "red" in a parameter space of colors. There are
essentially three dimensions to color space. Although r-g-b is widely used in computer applications,
humans seem to use an opponent-process basis (r-g, y-b, white-black) [Hurvich and Jameson, 1957].
One (admittedly overly simplified) way of transforming from (r,g,b) space to opponent color space
is to use the following linear transformation:

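$$\begin{bmatrix} rg \\ yb \\ bw \end{bmatrix} =
\begin{bmatrix} 1 & -2 & 1 \\ -1 & -1 & 2 \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} r \\ g \\ b \end{bmatrix} \qquad (3.1)$$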

Thus the Hough transform is given by $\langle a, b, f \rangle$ where

$$a = (r(x,y),\ g(x,y),\ b(x,y))$$

$$b = (rg,\ yb,\ bw)$$

and

$$f = Ta - b$$

where $T$ is the matrix defined by Eq. 3.1.

For a red spot on a green background there are two values of color parameters which have
high values for $C(b)$: red and green. The rest of the color Hough transform has low values. Figure 4
shows this idea, which has been used by [Hanson and Riseman, 1978; Ohlander et al., 1979],
applied to a color image.
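
The red-spot example can be sketched as a histogram over quantized opponent-color cells. The matrix `T` below is a reading of Eq. 3.1 with rows (1, -2, 1), (-1, -1, 2), (1, 1, 1); the pixel values, the bin size of 64, and the 0.5 cutoff for "high" confidence are assumptions chosen for illustration.

```python
from collections import Counter

# Opponent-color matrix as read from Eq. 3.1 (an assumption about the garbled original).
T = [[1, -2, 1],
     [-1, -1, 2],
     [1, 1, 1]]

def to_opponent(rgb):
    """Apply T to a = (r, g, b), giving (rg, yb, bw)."""
    return tuple(sum(t * v for t, v in zip(row, rgb)) for row in T)

def color_hough(pixels, bin_size=64):
    """Count pixels per quantized opponent-color cell and normalize by the peak."""
    H = Counter(tuple(v // bin_size for v in to_opponent(p)) for p in pixels)
    peak = max(H.values())
    return {cell: n / peak for cell, n in H.items()}

# A red spot on a green background, plus two stray pixels standing in for noise.
red, green = (200, 20, 20), (20, 200, 20)
image = [green] * 60 + [red] * 30 + [(90, 90, 200), (10, 10, 10)]
C = color_hough(image)
strong = sorted(cell for cell, c in C.items() if c >= 0.5)
```

Pixel positions are deliberately discarded: every pixel of one color lands in the same cell regardless of where it is, which is the many-to-one mapping that lets the patch be treated as a unit.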

Figure 4: Hanson and Riseman's
Segmentation in Color Space.

To show that intrinsic images and parameter spaces may be related in more complicated ways,
we briefly describe an example of how a specific two-dimensional shape is detected by specifying a
Hough transformation from edge space (local linear edges detected with a standard edge detector) to
a four-dimensional parameter space consisting of local origin coordinates, rotation and scale. Both
the color-space example and this one have the same solution at an abstract level. In each case there
is a transformation from intrinsic image space to parameter space that segments the image. In the
first case, points in the color image have the same color values. In the second case, points in the
edge image have the same shape parameter values. In fact, almost all segmentation problems can be
characterized in this fashion.


Table 2 shows some other Hough transforms.

Table 2: Hough Transforms

Intrinsic Image        Hough Transform

Optical flow           Heading
                       Rotation of 3-d rigid body

Surface orientation    Illumination angle
                       Shape

Occluding contour      Shape
                       Surface orientation

Disparity              Segments of constant disparity

Color                  Segments of constant color

The two-dimensional shape example shows the general feature of Hough transforms: if the
algorithms are completely parallel, the space required is exponential in the number of parameters.
This can lead to immense space requirements. For example, consider an eight-parameter space of
100 discrete values for each parameter. The total number of parameter nodes required to represent
the space is $100^8$. Fortunately this problem can generally be alleviated by detecting groups of
parameters sequentially. The example of 2-d shape detection is reconsidered in Section 3.2 to
illustrate this extremely powerful decomposition technique.

3.1  Detecting  Two-Dimensional  Shapes 

Two-dimensional shapes can be found from a primal sketch [Marr, 1978] by encoding the shape
information in constraint tables [Ballard, 1981a]. Consider the case where an object being sought has
no simple analytic form, but has a particular silhouette. Suppose for the moment that the object
appears in the image with known shape, orientation, and scale. (If orientation and scale are
unknown, they can be handled as additional parameters, as we will show.) Now pick a coordinate
system for the silhouette and draw a line to the boundary from the coordinate system origin. At the
boundary point we can compute the gradient direction and length and store the reference point as a
function of this information. Thus it is possible to precompute the location of the reference point
from boundary points given the gradient angle. The basic strategy of the Hough technique for
shapes is to compute the loci of points in parameter space from an edge in image space and
increment those points in an array. Figure 5 shows the relevant geometry.

Figure  5:  Geometry  for  the  Hough  Transform. 

In this case the reference point coordinates $(x_c, y_c)$ are the only parameters (remember, rotation and
scaling have been fixed). Thus if we encounter in an image an edge point $(x, y)$ with gradient
orientation $\phi$ and span $l$, we know that the possible reference points are at

$$(x + r(\phi, l)\cos(\alpha(\phi, l)),\ y + r(\phi, l)\sin(\alpha(\phi, l)))$$

and so on.

Thus  we  can  describe  the  generalized  Hough  algorithm  as  follows: 


Generalized  Hough  for  Shapes 

Step 0. Make a table for the shape to be located like that shown in Figure 2.

Step 1. Form an array of possible reference points
$H(x_c^{\min} : x_c^{\max},\ y_c^{\min} : y_c^{\max})$ initialized to zero.

Step 2. For each edge do the following:

Step 2.1. Compute $\phi(x)$, $l(x)$.

Step 2.2a. Calculate the possible centers, i.e., for each table entry for $(\phi, l)$
compute

$x_c := x + r(\phi, l)\cos(\alpha(\phi, l))$
$y_c := y + r(\phi, l)\sin(\alpha(\phi, l))$

Step 2.2b. Increment the array
$H(x_c, y_c) := H(x_c, y_c) + 1$

Step 3. Possible locations for the shape are given by maxima in the array $H$.
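
The steps above can be sketched in Python as follows. This is an illustrative reading rather than the original implementation: the table is keyed directly on the pair (gradient angle, length), the accumulator is a sparse counter instead of a fixed array, and the square test shape, its clutter edges, and all function names are hypothetical.

```python
import math
from collections import defaultdict, Counter

def build_r_table(boundary, reference):
    """Step 0: for each edge key (phi, l), store (r, alpha) pointing to the reference."""
    xr, yr = reference
    table = defaultdict(list)
    for x, y, phi, l in boundary:
        r = math.hypot(xr - x, yr - y)
        alpha = math.atan2(yr - y, xr - x)
        table[(phi, l)].append((r, alpha))
    return table

def generalized_hough(edges, table):
    """Steps 1-2: every edge votes for each reference point its table entries allow."""
    H = Counter()                                   # sparse accumulator, zero by default
    for x, y, phi, l in edges:                      # Step 2
        for r, alpha in table.get((phi, l), []):    # Step 2.2a
            xc = round(x + r * math.cos(alpha))
            yc = round(y + r * math.sin(alpha))
            H[(xc, yc)] += 1                        # Step 2.2b
    return H

def square(cx, cy, half=2):
    """Hypothetical model shape: square outline with edges (x, y, phi_degrees, l=1)."""
    pts = []
    for t in range(-half, half + 1):
        pts += [(cx + t, cy - half, 270, 1), (cx + t, cy + half, 90, 1),
                (cx - half, cy + t, 180, 1), (cx + half, cy + t, 0, 1)]
    return pts

table = build_r_table(square(0, 0), reference=(0, 0))
edges = square(7, 4) + [(1, 1, 90, 1), (9, 9, 0, 1)]   # shifted shape plus clutter
H = generalized_hough(edges, table)
(xc, yc), votes = H.most_common(1)[0]                   # Step 3: accumulator maximum
```

Each edge casts several votes because many boundary points share the same gradient angle; only the true reference point collects a vote from every boundary edge, so it stands out even with the clutter edges present.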

In terms of our Hough transform notation, the transform is of the form

$$\langle (\phi(x,y), l(x,y), x, y),\ (x_c, y_c),\ T \rangle$$

where $T$ is the constraint relation between $(\phi(x,y), l(x,y), x, y)$ and $(x_c, y_c)$ shown by Figure 5. Also the
inner loop of the algorithm (Step 2.2) computes $B_k$ given an edge $(\phi_k, l_k)$. The outer loop (Step 2)
computes $\bigcup_k B_k$. The results of using this transform to detect a shape are shown in Figure 6.
Figure 6a shows an image of shapes. The R-table has been made for the middle shape. Figure 6b
shows the Hough transform for the shape, i.e., $H(x_c, y_c)$ displayed as an image. Figure 6c shows the
shape given by the maxima of $H(x_c, y_c)$ overlaid on top of the image.

Figure 6: Applying the Generalized Hough Technique.

(a) Synthetic image, (b) Hough transform
$H(x_c, y_c)$ for middle shape.

What about the parameters of scale and rotation, $s$ and $\theta$? These are readily accommodated by
expanding the accumulator array and doing more work in the incrementation step. Thus, in Step 1,
the accumulator array is changed to

$H(x_c^{\min} : x_c^{\max},\ y_c^{\min} : y_c^{\max},\ s_{\min} : s_{\max},\ \theta_{\min} : \theta_{\max})$

and Step 2.2a is changed to

for each table entry for $(\phi, l)$ do
    for each $s$ and $\theta$
        $x_c := x + r(\phi, l)\, s \cos(\alpha(\phi, l) + \theta)$
        $y_c := y + r(\phi, l)\, s \sin(\alpha(\phi, l) + \theta)$

Finally, Step 2.2b is now

$H(x_c, y_c, s, \theta) := H(x_c, y_c, s, \theta) + 1$

Now the transform is given by $\langle (\phi(x,y), l(x,y), x, y),\ (x_c, y_c, s, \theta),\ T' \rangle$ where $T'$ incorporates the rules
for computing $s$ and $\theta$. Notice that this algorithm is notionally parallel since all the incrementations
are independent, and that the space required is exponential in the number of parameters.

3.2  Feature  Space  Decompositions 

In the example of Section 3.1, a particular shape is found by a notionally parallel transform
from edge space $(\phi(x,y), l(x,y), x, y)$ to a four-dimensional shape space $(x_c, y_c, s, \theta)$. However, time can
be traded for space by finding groups of these parameters sequentially. The advantage of the
sequential search is that the dimensionality of the computation at each stage is much less than that of the
single computation involving all of the parameters simultaneously [Ballard and Sabbah, 1981]. For
example, where $N_x$, $N_s$, and $N_\theta$ are the sizes of the spaces $(x_c, y_c)$, $s$, and $\theta$ respectively, searching
for a particular shape's parameters in the order $(s, \theta)$ and $(x_c, y_c)$ requires parameter space equal to
$N_s N_\theta + N_x$ instead of $N_s N_\theta N_x$. The Hough transform for the individual group is still notionally
parallel, so the time needed in the sequential transform is only proportional to the number of
parameter groups. In the shape example, the number of groups is two.
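
The saving is easy to quantify. The short calculation below assumes, purely for illustration, 100 discrete values per parameter, so that the $(x_c, y_c)$ space has $100^2$ cells; the helper names are hypothetical.

```python
def parallel_cells(sizes):
    """Cells needed when all listed parameters share one accumulator (a product)."""
    total = 1
    for n in sizes:
        total *= n
    return total

def sequential_cells(groups):
    """Cells needed when parameter groups are searched one after another (a sum)."""
    return sum(parallel_cells(g) for g in groups)

n = 100                                                # assumed values per parameter
full = parallel_cells([n] * 8)                         # eight parameters at once
shape_parallel = parallel_cells([n, n, n, n])          # (xc, yc, s, theta) together
shape_sequential = sequential_cells([[n, n], [n, n]])  # (s, theta), then (xc, yc)
```

Under these assumptions the four-parameter shape search needs $10^8$ cells in parallel but only $2 \times 10^4$ when split into two groups, at the cost of two sequential passes.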

To see how scale and orientation can be detected independently, consider a table that encodes
the orientation of the edge with respect to the silhouette's coordinate system. For example, for each
edge $(\phi, l)$ encode the angle necessary to rotate the edge clockwise so that it is parallel to the x-axis.
Using this table, the algorithm is as follows:

Hough  Algorithm  for  Orientation  and  Scale: 

Step 0. Make an orientation table as a function of $\phi$ and $l$.

Step 1. Form an array of possible scale-orientation pairs $H(0:2\pi,\ s_{min}:s_{max})$.

Step 2. For each edge do the following:

Step 2.1 Compute $\phi(x)$, $l(x)$.

Step 2.2 For each $s$ do the following:

(a) look up the table entry $\alpha(\phi(x), s \cdot l(x))$.

(b) increment the array
$H(\alpha, s) := H(\alpha, s) + 1$.

Step 3. Possible orientations and scales are given by maxima in the array H.
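
The steps above can be sketched in a few lines. This is one possible reading rather than the original implementation: the table here is keyed by model edge length only and stores that edge's direction, so the looked-up rotation is the difference between the image direction and the model direction. The triangle, its directions in whole degrees, and all names are hypothetical. Note that, following Step 2.2(a), $s$ multiplies the image length, so an image that is twice the model size peaks at $s = 0.5$.

```python
from collections import Counter

# Hypothetical orientation table for a 3-4-5 right triangle: it maps a
# model edge length to that edge's direction (degrees) in the model frame.
ORIENTATION_TABLE = {4.0: 0, 3.0: 90, 5.0: 217}

def lookup(phi, length, tol=1e-6):
    """Return the rotation alpha implied by an edge of direction phi and
    rescaled length, or None if no model edge has that length."""
    for model_length, model_phi in ORIENTATION_TABLE.items():
        if abs(model_length - length) < tol:
            return (phi - model_phi) % 360
    return None

def orientation_scale_hough(edges, scales):
    """Steps 1-3: vote in H(alpha, s) for every edge and candidate scale."""
    H = Counter()
    for phi, length in edges:                # Step 2
        for s in scales:                     # Step 2.2
            alpha = lookup(phi, s * length)  # Step 2.2(a)
            if alpha is not None:
                H[(alpha, s)] += 1           # Step 2.2(b)
    return H

# Image edges: the model rotated by 30 degrees and doubled in size.
edges = [(30, 8.0), (120, 6.0), (247, 10.0)]
H = orientation_scale_hough(edges, scales=[0.25, 0.5, 1.0, 2.0])
print(H.most_common(1))  # [((30, 0.5), 3)]
```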

The value of sequential searches through parameter space becomes even more important in 3-d
since this case requires seven parameters: three positional coordinates; three orientation angles; and
a scale factor. The sequential Hough-shape transform extends readily to 3-d and has been used to
detect polyhedra [Ballard and Sabbah, 1981] using the constraints of [Kanade, 1978; 1979].

The previous example is for a single shape. For N shapes, given that the search is in parallel, a
size factor of N is added to the search space. To cut down on the impact of this factor one needs a
shape taxonomy like that of Bribiesca [Bribiesca and Guzman, 1979] where all shapes can be
described as a branch in a single shape tree. The advantage of the shape tree is that rather than
looking for all N shapes in parallel, the search can be partitioned into searches of spaces of size $N_i$,
$N_{ij}$, $N_{ijk}$, etc., where the sum of these is roughly equivalent to log(N).


4.  Hierarchies  of  Abstraction  Levels 


The value of using several hierarchical levels of abstraction in vision is that the interaction
between levels is simplified. This does not mean that high-level descriptions cannot influence
low-level descriptions, or that the entire computations are not carried out in parallel. Rather, each
descriptive level can only influence nearby levels. In Sabbah [1981] the limitation is to levels
directly above and below. Other levels are influenced indirectly. The implication for the Hough
transforms, which specify the constraints between levels, is that the constraint relationships between
levels involve only a few parameters. This is an especially important feature, since the space
required by the Hough transform is exponential in the number of parameters, as are the sets $\{B_k\}$.
Different levels of abstraction have been used by [Hanson and Riseman, 1978]. Examples using the
Hough transform may be found in [Sabbah, 1981; Kender, 1978]. Sabbah uses four levels such as
those shown in Figure 7 to recognize origami world figures.


Figure  7. 

To show an example in detail, Kender's technique for detecting vanishing points in an image
from oriented line segments [Kender, 1978] is described. Line segments which are part of a
given vanishing point form a radial field which emanates from the point. Different vanishing points
have different sets of associated radial line segments (Fig. 8).

Figure  8. 

This example is interesting since the same situation occurs with respect to optical flow due to pure
translation. If the objects in the image are stationary with respect to a translating observer, then the
flow vectors will be emanating radially from a "focus-of-expansion" (FOE) in the direction of
motion. Objects translating with respect to the observer's frame will produce their own flow
emanating from a different FOE (Fig. 8).

This example involves two levels of abstraction. The first transforms colinear edge segments
into points (representing lines). Radial sets of edge elements correspond to circles through the origin
in line-space. Thus the second transformation is between circles in line-space and points in radial-field
space.


The first level is easy if a $(\rho, \theta)$ line space is used where

$$\rho = x \cos\theta + y \sin\theta.$$

Since an edge element has direction $\alpha$ (Fig. 9), each such element maps onto precisely one point in
$(\rho, \theta)$ space: $(x \cos\alpha + y \sin\alpha,\ \alpha)$. Thus the Hough transform, in the notation of Section 3, is:

$$\langle (x, y, \alpha(x,y)),\ (\rho, \theta),\ (\theta = \alpha;\ \rho = x \cos\alpha + y \sin\alpha) \rangle.$$

Figure  9. 


Now maxima in $C(\rho, \theta)$ correspond to lines in the image. Also, radial lines will form a circle of
local maxima in $(\rho, \theta)$-space. To see this note that the triangle OFQ in Figure 9b is always a right
triangle, and therefore OF must be the diameter of a circle. Note that this circle is constrained to
go through the origin so that its diameter must be on the line

$$\rho/2 = a \cos\theta + b \sin\theta$$

where $(2a, 2b)$ is the location of the focus of expansion (or vanishing point). Thus the second
transform is

$$\langle (\rho, \theta),\ (a, b),\ (\rho/2 = a \cos\theta + b \sin\theta) \rangle.$$
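
The two levels can be sketched as follows. This is a minimal illustration under stated assumptions, not Kender's implementation: the edge direction $\alpha$ is taken to be the normal direction of the line through the edge element, the synthetic edges and the candidate grid for $(a, b)$ are hypothetical, and the function names are invented for the example.

```python
import math
from collections import Counter

def line_space_point(x, y, alpha):
    """First level: an edge element at (x, y) with normal direction alpha
    maps to the single point (rho, theta) = (x cos a + y sin a, a)."""
    return x * math.cos(alpha) + y * math.sin(alpha), alpha

def vote_focus(points, candidates, tol=1e-6):
    """Second level: each (rho, theta) votes for every candidate (a, b)
    satisfying rho/2 = a cos(theta) + b sin(theta) within tol."""
    votes = Counter()
    for rho, theta in points:
        for a, b in candidates:
            if abs(rho / 2 - (a * math.cos(theta) + b * math.sin(theta))) < tol:
                votes[(a, b)] += 1
    return votes

# Hypothetical data: edge elements on lines radiating from the focus (6, 4).
focus = (6.0, 4.0)
edges = []
for k in range(1, 8):
    direction = k * math.pi / 8      # direction of the radial line
    alpha = direction + math.pi / 2  # its normal direction
    x = focus[0] + 3.0 * math.cos(direction)
    y = focus[1] + 3.0 * math.sin(direction)
    edges.append((x, y, alpha))

points = [line_space_point(*e) for e in edges]
candidates = [(a / 2, b / 2) for a in range(0, 9) for b in range(0, 9)]
votes = vote_focus(points, candidates)
(a, b), count = votes.most_common(1)[0]
print((2 * a, 2 * b), count)  # (6.0, 4.0) 7
```

Each line-space point constrains $(a, b)$ to a line, and only the true focus lies on all seven of them, so it collects every vote.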


Implementation  in  Parameter  Networks 


In our earlier definition of the Hough transform, we assumed that the measurements $a_k$ all had
confidence equal to unity. With Hough transforms at higher levels of abstraction, this may no longer be
the case. This is easily handled by keeping track of the confidences in the set $B_k$, i.e.,

$$B_k = \{(b, C) \mid f(a_k, b) < \delta_b \text{ and } C = C(a_k)\}.$$

Then $H(b)$ is the sum of the confidences associated with the value $b$ in $\bigcup_k B_k$.
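
A minimal sketch of this confidence-weighted accumulation follows. The measurements, the candidate values, the consistency function (a plain absolute difference), and the function name are all hypothetical choices made for illustration.

```python
from collections import defaultdict

def weighted_hough(measurements, candidates, f, delta):
    """Accumulate H(b) as the sum of confidences C(a_k) over all
    measurements a_k that are consistent with b, i.e. f(a_k, b) < delta."""
    H = defaultdict(float)
    for a_k, confidence in measurements:
        for b in candidates:
            if f(a_k, b) < delta:
                H[b] += confidence
    return dict(H)

# Hypothetical data: scalar measurements paired with confidences.
measurements = [(1.0, 0.9), (1.1, 0.5), (3.0, 0.2), (2.9, 0.1)]
candidates = [1, 2, 3]
H = weighted_hough(measurements, candidates, lambda a, b: abs(a - b), 0.25)
print(H)
```

Here the value 1 accumulates confidence 1.4 from two strong measurements while 3 accumulates only about 0.3, even though each is supported by two measurements.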


5.  Focus  of  Attention

Previously, intrinsic image to feature space transforms used single Hough transforms. We are
now ready to tackle issues which arise when several Hough transforms are used. First we show that
multiple Hough transforms can be invoked in parallel to resolve the problem of detecting a unit
with multiple features. This is done via the mechanism of a context Hough transform which is the
sum of individual Hough transforms. Next, we describe a focusing mechanism which exploits the
fact that an ambiguity in one space may be resolved in another. This technique allows the detection
of arbitrarily fine detail. Attention can be directed from a unit to its subparts and back again via a
mechanism termed sequencing.

5.1  Spatial  Context 

If a unit has multiple spatially registered features, these can be detected by applying two
different sets of Hough transforms. The Hough transform defined in Section 3 is bottom-up:
points in the intrinsic image space determine plausible sets of points in feature space. The
complementary transform is top-down: points in feature space determine plausible sets of points in
intrinsic image space. Formally, given a set $\{b_k\} \subseteq B$, we compute

$$A_k = \{a \mid b_k \text{ and } f(a, b_k) < \delta_b\}$$

$H(a)$ is the number of times the value $a(x)$ occurs in $\bigcup_k A_k$. The mapping which defines $H(a)$ is
likely to be one to many and furthermore, for a given feature, different $b_k$'s should give rise to
disjoint subsets of A. Owing to this last point, it is intuitively appealing to deal with $H_a(x)$ which is
simply the sum of the confidences of different values of the parameters $a_1, a_2, \ldots$ which are at the
same spatial location x,y, i.e.,

$$H_a(x) = \sum_a H(a, x)$$

An  Example 

Consider again the image of a red spot on a green background, where the spot takes up one-third
of the image pixels. Then the transform $H(b)$ where $b = (r, g, b)$ has two peaks and is zero
everywhere else, i.e., for four-bit color scale accuracy

$$H(b) = \begin{cases} 1 & \text{if } b = (0, 15, 0) \\ 1/2 & \text{if } b = (15, 0, 0) \\ 0 & \text{otherwise} \end{cases}$$

Now consider $b_1 = (15, 0, 0)$ and compute $H(a, x)$. This is given by

$$H(a, x) = \begin{cases} 1 & \text{if } x \text{ in spot and } a = \text{RED} \\ 0 & \text{otherwise} \end{cases}$$

A point in A represents the single color red and so $H(x)$ in this case is

$$H(x) = \begin{cases} 1 & \text{if } x \text{ in spot} \\ 0 & \text{otherwise} \end{cases}$$


The transform $H(x)$ is called the spatial context transform for reasons that will become more
apparent when we discuss focus of attention. The effect of this transform is to place an imaginary
filter in front of the sensors. In the above case, only sensors that are spatially registered with RED
sensors would receive input.
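
The red-spot example can be reproduced on a toy image. This is a minimal sketch: the 3 x 3 image is hypothetical, and scaling the bottom-up counts so that the largest peak equals 1 is an assumption made to match the values quoted in the example.

```python
RED, GREEN = (15, 0, 0), (0, 15, 0)

# Hypothetical 3 x 3 image: the red spot covers one third of the pixels.
image = [
    [GREEN, GREEN, GREEN],
    [RED,   RED,   RED],
    [GREEN, GREEN, GREEN],
]

def bottom_up(image):
    """H(b): count of pixels of each color, scaled so the largest peak is 1."""
    counts = {}
    for row in image:
        for color in row:
            counts[color] = counts.get(color, 0) + 1
    peak = max(counts.values())
    return {color: n / peak for color, n in counts.items()}

def top_down(image, b):
    """H(x): 1 where the pixel matches the chosen feature value b, else 0."""
    return [[1 if color == b else 0 for color in row] for row in image]

print(bottom_up(image))      # {(0, 15, 0): 1.0, (15, 0, 0): 0.5}
print(top_down(image, RED))  # [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
```

The top-down map plays the role of the imaginary filter: it is nonzero only at the locations registered with the red sensors.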

Multiple  Features 

Now consider the case where multiple features are present in the image. Individual
bottom-up transforms for different features can be applied in parallel to compute $H_1(b_1)$,
$H_2(b_2)$, ..., $H_m(b_m)$ (for m feature spaces). Now maxima in each of these spaces can be used to
compute individual top-down transforms $H_{a_1}(x), H_{a_2}(x), \ldots, H_{a_m}(x)$. The generalized spatial-context
transform $H(x)$ is simply the normalized sum of these individual transforms, i.e.,

$$H(x) = \frac{1}{m} \sum_k H_{a_k}(x)$$

Now the value of $H(x)$ at a point x is the fraction of the maximum number of spatially
registered features that are present. High values of H correspond to spatially registered intrinsic
image points which each have been grouped into a unit by a separate bottom-up transform. Thus
$H(x)$ represents a possible solution to the multiple-feature problem.
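
The normalized sum is straightforward to compute. The sketch below assumes each top-down transform is available as a small 0/1 map of the same shape; the two maps and their names are hypothetical.

```python
def spatial_context(maps):
    """H(x) = (1/m) * sum_k H_ak(x) for m top-down maps of equal shape."""
    m = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(h[r][c] for h in maps) / m for c in range(cols)]
            for r in range(rows)]

# Hypothetical top-down maps for two features on a 2 x 3 image.
h_color = [[1, 1, 0], [0, 0, 0]]
h_motion = [[0, 1, 1], [0, 0, 1]]
H = spatial_context([h_color, h_motion])
print(H)  # [[0.5, 1.0, 0.5], [0.0, 0.0, 0.5]]
```

Only the location where both features are present reaches the value 1.0.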

5.2  Subspaces  and  Sequencing 

In our formalism, a segment in an image is ideally represented as a conjunction of Hough
transform maxima. Each set of maxima corresponds to an organization with respect to a given
modality: color, velocity, etc. In the previous section we showed how the parallel generation of these
maxima could be used to discover regions in the image corresponding to multi-modal units.
Unfortunately, this technique will usually be inadequate because the unit is not manifested as a
clear maximum in all the modalities. As an example, consider a light-blue, moving unit, against a
background of other units, none of which are light-blue, but which are moving. In the color space,
the unit is clearly revealed; light-blue units have high confidence values (Fig. 10).

Figure  10. 

In  velocity  space,  however,  there  is  no  clear  maximum  owing  to  the  presence  of  other  moving  units. 

The fundamental problem is that each modality consists of a projection of feature space. In the
high-dimensional space consisting of the concatenation of all the individual dimensions of each
modality, each unit would appear as a distinct maximum. The visual system model is structured to
examine only the subspaces of the individual modalities. The principal reason for this is economy;
the space requirement increases exponentially with the number of modalities.

This problem can be surmounted if the different parameter spaces are examined sequentially.
First the parameter spaces are examined for maxima. The most distinct maximum is picked and its
inverse Hough transform, $C(x)$, is generated. This transform can be used to block input from sensors
positioned at its low confidence values. To see how this might work, let us reconsider the previous
example of the light-blue, moving unit. In color space there is a clear maximum corresponding to
light-blue. This value is used to generate $C_{light\text{-}blue}(x)$ and block input from all sensors that are
not spatially registered with light-blue color input. The net effect is that in velocity space there is
now a clear maximum as input from other units has been blocked.
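
The light-blue example can be sketched with simple histograms standing in for the Hough accumulators. The sensor records, field names, and values below are hypothetical; the gating step is a stand-in for blocking input via the inverse transform.

```python
from collections import Counter

def sensor(pos, color, velocity):
    return {"pos": pos, "color": color, "velocity": velocity}

# Hypothetical sensors: one light-blue unit among other moving units.
sensors = [
    sensor(0, "light-blue", (1, 0)), sensor(1, "light-blue", (1, 0)),
    sensor(2, "light-blue", (1, 0)), sensor(3, "red", (0, 1)),
    sensor(4, "red", (0, 1)), sensor(5, "green", (0, 1)),
    sensor(6, "yellow", (-1, 0)), sensor(7, "yellow", (-1, 0)),
    sensor(8, "brown", (-1, 0)),
]

def histogram(items, field):
    """Hough-style accumulation: count occurrences of each feature value."""
    return Counter(item[field] for item in items)

# Velocity space alone: three equal peaks, no clear maximum.
print(histogram(sensors, "velocity"))

# Color space has a distinct maximum; use it to gate the sensors.
best_color, _ = histogram(sensors, "color").most_common(1)[0]
gated = [s for s in sensors if s["color"] == best_color]

# Velocity space after gating: a single clear maximum.
print(histogram(gated, "velocity"))  # Counter({(1, 0): 3})
```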

5.3  Multiple,  Spatially-Registered  Features 

Sequencing solves the problem of building up coherent groups of features, but has its
drawbacks. For example, if the "blue," "moving," "horizontal" object were a "frisbee," one would
like this percept to be triggered via a Hough-like transform. However, in the sequencing example,
there is initial evidence for all light-blue objects, and this is a very large set. Worse, the percept
"frisbee" could be triggered by non-spatially registered groups of "blue" and "moving" inputs.
There is a solution to these problems if we assume that, in general, actual occurrences of features
will  be  sparse.  In  other  words,  in  a  given  image  there  should  not  be  two  very  similar  colors 
associated  with  different  objects.  If  there  are,  our  Hough  transform  model  will  only  be  able  to 
concentrate  on  one  of  them  at  a  time. 

The solution is due to [Feldman and Ballard, 1981]. We will develop it here in terms of our
Hough transform formalism in stages. First we formally concatenate parameter spaces. Next we
describe low-resolution concatenated spaces. Finally, we consider low-resolution parameter spaces
which can be tuned to specific parameter values.

Ideally, one could resolve the spatial registration problem by concatenating feature spaces. For
example, concatenating color space with motion space leads to

$$B_k = \{(b_c, b_m) \mid a_{kc}(x_k),\ a_{km}(x_k),\ f_c(a_{kc}, b_c) < \delta_c,\ f_m(a_{km}, b_m) < \delta_m\}$$

where elements in the expanded space $(b_c, b_m) \in B_c \times B_m$ are only included if the input features are
spatially registered, i.e., $a_{kc}$ and $a_{km}$ are measured at the same location $x_k$. While this is simply
described in symbolic form, it is also impractical since
the parameter spaces for the combined-modality elements are impractically large. A partial solution
to the size problem is to decrease the number of parameter nodes. Let $b_c'$, $b_m'$ be values for color
and motion parameters respectively in the low-resolution spaces. Then the low-resolution Hough
transform is given by

$$B_k = \{(b_c', b_m') \mid a_{kc}(x_k),\ a_{km}(x_k),\ f_c(a_{kc}, b_c') < \Delta_c,\ f_m(a_{km}, b_m') < \Delta_m\} \qquad (5.1)$$
where the bounds $\Delta_c$ and $\Delta_m$ are larger to account for the lower resolution in parameter space. The
grain of the low-resolution space can always be chosen to make the transform practical in terms of
space. However, now groups of parameters that are sufficiently similar may be transformed into the
same parameter node via Eq. (5.1). To resolve this problem we use a two-tiered transform,
consisting of high-resolution single-modality transforms and low-resolution multi-modality
transforms. Using the single-modality transforms, we select maxima $\{b_c^*\}$ and $\{b_m^*\}$ such that

$$b_c^* = \max_{b_c} \{b_c \in b_c' \pm .5\Delta_c\}$$

and

$$b_m^* = \max_{b_m} \{b_m \in b_m' \pm .5\Delta_m\}.$$

These values are then used to tune the low-resolution Hough transform, i.e.,

$$B_k = \{(b_c', b_m') \mid a_{kc},\ a_{km},\ f_c(a_{kc}, b_c^*) < \Delta_c,\ f_m(a_{km}, b_m^*) < \Delta_m\}.$$

Thus the low-resolution transform can be tuned to count only a subset of the high-resolution
parameter nodes. The drawback of this technique is that it can only respond to a single value of
$(b_c, b_m)$ in each range $\{b_c' \pm .5\Delta_c,\ b_m' \pm .5\Delta_m\}$. Thus either the high-confidence parameter nodes
must be sufficiently sparse, or only one of the confusion classes can be examined at any one time.
This disadvantage is outweighed by being able to detect spatially-registered features and thus
circumvent the more severe problem discussed earlier.

6.  Tight  Coupling 

Most of the previous examples imply that the various Hough transforms are relatively
independent. That is, once the intrinsic images are computed, the transforms can be computed. The
general case is that this is not true; the intrinsic image contains global parameters which must be
computed using Hough transforms. Since the Hough transform requires an intrinsic image it might
seem that neither could be computed. In fact, both the Hough transform and the intrinsic images
can be computed by incorporating the Hough transforms into the parallel-iterative scheme used to
compute the intrinsic images. If the combined problem is well-conditioned: 1) the partial result for
the intrinsic image will be sufficient to produce a partial result for the Hough transform, and vice
versa; and 2) this process of using partial results in a parallel-iterative manner will converge. We
term this interdependence tight coupling and illustrate it with two examples.

In the first example, we show how a surface orientation intrinsic image can be computed from
intensity information. This example seems paradoxical at first since to compute surface orientation
one must know the location of the source of illumination and vice versa. We show how both these
computations can be conducted simultaneously with the partial result for the surface orientation
helping the illumination angle determination, and the partial result for the illumination angle
helping the surface orientation determination. The illumination angle is determined by a Hough
transform.

In the second example, we show that a three-dimensional flow field can be segmented into
groups of vectors that represent general rigid body motion. The problem here is that an individual
field vector $v(x)$ is an unknown sum of rotational and translational components, i.e., $v(x) = v_R(x)
+ v_T(x)$. These components can only be determined by knowing global rigid body motion
parameters. However, these parameters can be determined only if $v(x)$ is partitioned into $v_R(x)$ and
$v_T(x)$. As in the earlier example, this problem can be resolved by a parallel-iterative scheme which
computes both the global parameters and the velocity-field decomposition simultaneously.

Rather than being isolated examples, tight coupling is believed to be the general case.
Extending the scope of the parallel-iterative computation is the general solution.

6.1  Shape  from  Shading  by  Relaxation 

Given the orientation of a surface with respect to a viewer, its reflectance properties and the
location of a single light source, the brightness at a point of the viewer's retina can be
determined. That is, the reflectance function $R(\theta, \phi, \theta_s, \phi_s)$, where $\theta, \phi$ and $\theta_s, \phi_s$ are orientations of
the surface and source respectively, allows us to determine $I(x,y)$, the intensity in terms of retinal
coordinates [Horn and Sjoberg, 1978]. The form of R is assumed to be known. However, the
perceptual problem is the reverse: given $I(x,y)$ and R, determine $\theta(x,y)$, $\phi(x,y)$ and $\theta_s, \phi_s$.

In general, the problem of deriving $\theta(x,y)$, $\phi(x,y)$ and $\theta_s, \phi_s$ is underdetermined. However,
Ikeuchi [1980] showed that the surface could be determined locally once $\theta_s, \phi_s$ was specified. This
method has been extended [Ballard, 1981b] to the case where $\theta_s, \phi_s$ is initially unknown.

The algorithm is outlined as follows. For a single light source, the intensity at a point on a
retina can be described in terms of the orientation of the normal of the corresponding surface point
and the source orientation. That is, in spherical notation,

$$I(x,y) = R(\theta, \phi, \theta_s, \phi_s)$$

where the angles $\theta$ and $\phi$ are functions of x and y. Now by minimizing $(I-R)^2$ and appending a
smoothness constraint on $\theta$ and $\phi$ we have [Ikeuchi, 1980] an expression for the local error (if the
estimate for $\theta$ and $\phi$ is unreliable) as follows:

$$E(x,y) = (I-R)^2 + \lambda\left((\nabla^2\theta)^2 + (\nabla^2\phi)^2\right)$$

where $\lambda$ is a Lagrange multiplier. For a minimum, $E_\theta = E_\phi = 0$. Skipping some steps, this leads
to

$$\phi(x,y) = \phi_{ave}(x,y) + T(x,y) R_\phi$$

$$\theta(x,y) = \theta_{ave}(x,y) + T(x,y) R_\theta$$

where $\phi_{ave}(x,y)$ is a local average and

$$T(x,y) = \frac{1}{16\lambda}(I-R)$$

In solving these equations, we assume $\theta_s$ and $\phi_s$ are known. An iterative method is used where the
$\phi_{ave}$ and $\theta_{ave}$ are calculated from a previous iteration.

To calculate $\theta_s$ and $\phi_s$, we assume $\theta$ and $\phi$ are known and use a Hough technique. First we
form an array $H[\theta_s, \phi_s]$ of possible values of $\theta_s$ and $\phi_s$ initialized to zero. Now we can solve the
reflectance equation for $\phi_s$. The Hough technique works as follows. For each surface element $\theta, \phi$,
and for each $\theta_s$ we calculate $\phi_s$ and increment $H[\theta_s, \phi_s]$, i.e., $H[\theta_s, \phi_s] := H[\theta_s, \phi_s] + 1$. After all
surface elements have been processed, the maximum value of H corresponds to the location of the
point source. In [Ballard, 1981b] it is shown that calculation of the source location can proceed in
parallel with that of $\theta(x,y)$ and $\phi(x,y)$ and that the two calculations will converge.
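
The source-location vote can be illustrated in one dimension, where surface and source are each described by a single angle. This is a simplified sketch and not the method of [Ballard, 1981b]: the Lambertian reflectance, the synthetic surface normals, the candidate grid, and the direct matching test (instead of solving for $\phi_s$) are all assumptions made for the example.

```python
import math
from collections import Counter

def reflectance(theta, theta_s):
    """Assumed 1-D Lambertian reflectance: cosine of the angle between
    the surface normal and the source direction, clipped at zero."""
    return max(0.0, math.cos(math.radians(theta - theta_s)))

def source_hough(elements, candidates, tol=1e-6):
    """Each surface element (theta, I) votes for every candidate source
    angle whose predicted brightness matches the measured intensity."""
    H = Counter()
    for theta, intensity in elements:
        for theta_s in candidates:
            if abs(intensity - reflectance(theta, theta_s)) < tol:
                H[theta_s] += 1
    return H

# Synthetic data: normals in degrees, lit from a source at 40 degrees.
true_source = 40
normals = [-30, -10, 10, 30, 50]
elements = [(t, reflectance(t, true_source)) for t in normals]
H = source_hough(elements, candidates=range(-80, 81, 10))
best, votes = H.most_common(1)[0]
print(best, votes)  # 40 5
```

A single element is ambiguous, since the source could lie on either side of its normal, but the mirror-image candidates differ from element to element, so only the true angle accumulates all the votes.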

Results for the one-dimensional case are shown in Figure 11 for the case of a small surface
"bubble." Figure 11 shows the surface convergence, as well as the convergence of the illumination
angle.


Figure 11: (a) Shading (top left curve).

(b) Surface convergence (colored points immediately below (a)).

(c) Illumination angle Hough transform (bottom left).

(d) Illumination angle convergence (upper right).

It is important to remember that the boundary conditions in this problem have been provided a
priori; in this case they are the orientation of the surface at the boundary of the bubble. Generally,
these will have to be determined by multiple intrinsic image relaxations, as mentioned in Section 2.

6.2  3-D  Rigid  Body  Motion 

The general motion of a rigid body can be described by eight parameters: three for
translational velocity $v_T$; three for angular velocity $\Omega$; and two for the location of the axis of
rotation r. We describe the detection of rigid body motion in three parts, each of which uses Hough
transforms. First, we show how to detect pure translation ($v_T$). Next we show how to detect pure
rotation ($\Omega$, r). Finally, we show that a 3-d flow vector can be iteratively decomposed into a
translational component and a rotational component. These components are described by the
parameters ($v_T$, $\Omega$, r).

Pure  Translational  Motion 

This case is very simple. If a rigid body is translating with velocity $v_T$, then a point on the
body at location x will have velocity $v(x) = v_T$. To detect this take the Hough transform given by
$\langle (x, v(x)),\ (v_T),\ (v(x) - v_T = 0) \rangle$. The maximum value in $H(v_T)$ will correspond to the translational
velocity.
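
Because the constraint is simply $v(x) = v_T$, this transform reduces to a histogram of the measured velocities. A minimal sketch with hypothetical, already-quantized flow samples:

```python
from collections import Counter

def translation_hough(flow):
    """Each measured vector v(x) votes for the cell v_T = v(x)."""
    return Counter(v for _, v in flow)

# Hypothetical flow samples (x, v(x)): one translating body plus two outliers.
flow = [((0, 0, 0), (2, 0, 1)), ((1, 0, 0), (2, 0, 1)),
        ((0, 1, 0), (2, 0, 1)), ((5, 5, 5), (0, 3, 0)),
        ((6, 5, 5), (1, 1, 1))]
H = translation_hough(flow)
v_T, votes = H.most_common(1)[0]
print(v_T, votes)  # (2, 0, 1) 3
```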

Pure  Rotational  Motion 

In the case of pure rotational rigid-body motion, each point moves about an axis in space such that

$$v(x) = \Omega \times \rho(x), \qquad (6.1)$$

where $v$, $\Omega$, and $\rho$ are all orthogonal and $\rho(x)$ is a vector from the point x to the axis of rotation
such that

$$\rho \times (\Omega \times v) = 0.$$

That is, $\rho$ is defined so as to be perpendicular to $\Omega$ and $v$.

One problem is to specify the axis of rotation. This is done using a vector r which is the
smallest vector from the origin to the rotation axis (see Figure 12).


Figure  12. 


The pure rotation case involves five parameters: three for the vector $\Omega$ and two to specify the
axis of rotation. A standard Hough technique would involve a transformation from $(x, v(x))$ to
$(\Omega, r')$ using Eq. (6.1). Only a vector $r'$ equal to any two components of r is necessary since $\Omega \cdot r =
0$. However, a five-dimensional space is large, thus we are motivated to decompose the parameter
space $(\Omega, r)$ into two smaller spaces [Ballard and Sabbah, 1981]. One space is composed of two
components $\omega_x$ and $\omega_y$ of a unit vector $\omega$ which defines the direction of $\Omega$. The other is composed
of the magnitude of $\Omega$ and two components of r.

Since $\omega$ must be perpendicular to $v$,

$$\omega \cdot v = 0.$$

Furthermore, $|\omega| = 1$. Combining these two equations leads to

$$\omega_x v_x + \omega_y v_y + (1 - \omega_x^2 - \omega_y^2)^{1/2} v_z = 0 \qquad (6.2)$$

which is a quadratic equation in unknowns $\omega_x$ and $\omega_y$. Thus the direction of the rotation vector
may be found from the Hough transform

$$\langle (v(x)),\ (\omega_x, \omega_y),\ (\text{Eq. (6.2)}) \rangle.$$
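
A minimal sketch of this vote follows. It assumes a coarse hypothetical grid over $(\omega_x, \omega_y)$, synthetic velocities generated as $\omega \times p$ for a few arbitrary vectors $p$, and the positive square root in Eq. (6.2), which fixes the sign of $\omega_z$; none of these choices come from the original text.

```python
import math
from collections import Counter

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def direction_hough(velocities, grid, tol=1e-9):
    """Each velocity votes for every (wx, wy) cell satisfying Eq. (6.2)."""
    H = Counter()
    for v in velocities:
        for wx in grid:
            for wy in grid:
                rest = 1.0 - wx * wx - wy * wy
                if rest < 0:
                    continue  # (wx, wy) cannot belong to a unit vector
                residual = wx * v[0] + wy * v[1] + math.sqrt(rest) * v[2]
                if abs(residual) < tol:
                    H[(wx, wy)] += 1
    return H

# Synthetic velocities, all perpendicular to the true direction omega.
omega = (0.6, 0.0, 0.8)
arms = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
velocities = [cross(omega, p) for p in arms]
grid = [i / 5 for i in range(-4, 5)]
H = direction_hough(velocities, grid)
best, votes = H.most_common(1)[0]
print(best, votes)  # (0.6, 0.0) 4
```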

Once $\omega$ is known, it can be used in the following series of equations. If $|\Omega|$ is the magnitude of
the rotation vector, the vector s given by

$$s = x - \omega \times v / |\Omega|$$

is on the rotation axis. Furthermore, r is given by

$$r = s - (s \cdot \omega)\,\omega$$

so that

$$r = (x - \omega \times v / |\Omega|) - (x \cdot \omega)\,\omega. \qquad (6.3)$$

This equation can be used to determine the first two components of r given a value for $|\Omega|$. Thus
we can determine $|\Omega|$ and r from the following Hough transform:

$$\langle (x, v(x), \omega),\ (r_x, r_y, |\Omega|),\ (\text{Eq. (6.3)}) \rangle.$$
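
Eq. (6.3) and the associated vote can be checked on synthetic data. The sketch below follows the sign convention of Eq. (6.1), in which $\rho(x)$ points from x to the axis; the axis, the sample points, the candidate magnitudes, and the helper names are hypothetical.

```python
from collections import Counter

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def rotational_velocity(x, omega, mag, r):
    """v(x) = Omega x rho(x), with rho the perpendicular vector from x to
    the axis (the sign convention of Eq. (6.1))."""
    d = tuple(ri - xi for ri, xi in zip(r, x))
    k = dot(d, omega)
    rho = tuple(di - k * wi for di, wi in zip(d, omega))
    return tuple(mag * c for c in cross(omega, rho))

def axis_offset(x, v, omega, mag):
    """Eq. (6.3): r = (x - omega x v / |Omega|) - (x . omega) omega."""
    wxv = cross(omega, v)
    k = dot(x, omega)
    return tuple(xi - ci / mag - k * wi for xi, ci, wi in zip(x, wxv, omega))

omega, true_mag, true_r = (0.0, 0.0, 1.0), 2.0, (3.0, 1.0, 0.0)
points = [(5.0, 2.0, 7.0), (3.0, 4.0, -1.0), (0.0, 1.0, 2.0)]
flow = [(x, rotational_velocity(x, omega, true_mag, true_r)) for x in points]

# For each candidate magnitude, every flow vector votes for an (rx, ry) cell.
H = Counter()
for mag in (1.0, 2.0, 4.0):
    for x, v in flow:
        r = axis_offset(x, v, omega, mag)
        H[(round(r[0], 6), round(r[1], 6), mag)] += 1

best, votes = H.most_common(1)[0]
print(best, votes)  # (3.0, 1.0, 2.0) 3
```

Only the correct magnitude makes all points agree on the same axis offset; for the wrong magnitudes the votes scatter.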

General  Rigid  Body  Motion 

Finally, suppose the motion is completely general so that

$$v(x) = v_T(x) + \Omega \times \rho(x).$$

Since only $v(x)$ can be measured, how can one determine how much is translational velocity and
how much is rotational velocity? One possibility is to dynamically partition the velocity into two
components $v_T(x)$ and $v_R(x)$ which give the most consistent global parameters ($v_T$, $\Omega$ and r). This
would work as follows:

Step 0. Assume $v_R(x) = v(x)$.

Step 1. Use the Hough transforms to estimate ($\Omega$, r).

Step 2. Use ($\Omega$, r) to determine $v_R(x)$.

Step 3. Compute $v_T(x) = v(x) - v_R(x)$.

Step 4. Use the Hough transform to estimate $v_T$.

Step 5. Compute $v_R(x) = v(x) - v_T(x)$.

Step 6. If $v_R(x)$ has not converged, go to Step 1.


7.  Discussion

The key ideas of this paper are summarized in the introduction. Here we mention other ideas
which do not fit easily under any one of the previous headings.

1) The Intrinsic Image/Feature Space Duality. By distinguishing between image fields and
image features we know when relaxation is the more important tool and when the Hough transform
is more important.

2) Unit/value. By reducing the underlying primitives to units of extreme simplicity we can
algorithmically determine the connection patterns to represent m-ary relations.

3) Massive Parallelism. By assuming the availability of massive parallel computation, we reduce
the need for sequential processing to more essential cases. For example, we use sequential
processing in Section 5 to resolve real ambiguities in the input.

4) Extensibility. The representation is very general, being m-ary consistency relations, and can
be extended to other domains besides vision, and to arbitrary levels of abstraction within vision.